Smart Paper Notes

Toward Clinically Assisted Colorectal Polyp Recognition via Structured Cross-modal Representation Consistency

Weijie Ma , Ye Zhu , Ruimao Zhang , Jie Yang , Yiwen Hu , Zhen Li , Li Xiang

Category: Computer Vision

2022-06-23

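Colorectal polyp classification is a critical clinical examination. To improve classification accuracy, most computer-aided diagnosis algorithms recognize colorectal polyps using narrow-band imaging (NBI). However, NBI is often underused in real clinical settings, because acquiring these images requires manually switching the light mode once a polyp has been detected in white-light (WL) images. To avoid this, we propose a novel method that directly achieves accurate white-light colonoscopy image classification by enforcing structured cross-modal representation consistency. In practice, a pair of multi-modal images, i.e., NBI and WL, is fed into a shared Transformer to extract hierarchical feature representations. A newly designed Spatial Attention Module (SAM) then computes the similarities between the class token and the patch tokens at multiple levels for each modality image. By aligning the class tokens and spatial attention maps of the paired NBI and WL images, the Transformer keeps both global and local representations consistent across the two modalities. Extensive experimental results show that the proposed method outperforms recent studies, realizing multi-modal prediction with a single Transformer while greatly improving classification accuracy when only WL images are used.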
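The alignment idea in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not give the similarity function or the loss, so the scaled dot product, the softmax, and the mean-squared-error terms below are assumptions, and the function names `spatial_attention` and `consistency_loss` are hypothetical.

```python
import math

def spatial_attention(cls_token, patch_tokens):
    """Softmax over scaled dot products between the class token and each patch token.

    Assumption: the abstract only says "similarities"; scaled dot product is one choice.
    """
    scale = math.sqrt(len(cls_token))
    scores = [sum(c * p for c, p in zip(cls_token, patch)) / scale
              for patch in patch_tokens]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def consistency_loss(cls_nbi, patches_nbi, cls_wl, patches_wl):
    """Global term aligns class tokens; local term aligns spatial attention maps.

    Assumption: equal weighting of the two terms, MSE as the distance.
    """
    global_term = mse(cls_nbi, cls_wl)
    local_term = mse(spatial_attention(cls_nbi, patches_nbi),
                     spatial_attention(cls_wl, patches_wl))
    return global_term + local_term
```

In a multi-level setup, such a loss would presumably be evaluated once per Transformer stage and summed, but the abstract does not specify this.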

Translated by Google Translate

2nd Place Solution for ICCV 2021 VIPriors Image Classification Challenge: An Attract-and-Repulse Learning Approach

Yilu Guo , Shicai Yang , Weijie Chen , Liang Ma , Di Xie , Shiliang Pu

Category: Computer Vision

2022-06-13

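Convolutional neural networks (CNNs) have achieved significant success in image classification by using large-scale datasets. However, learning from scratch on small-scale datasets efficiently and effectively remains a great challenge. With limited training data, the concepts of categories become ambiguous, because over-parameterized CNNs tend to simply memorize the dataset, which leads to poor generalization. It is therefore crucial to study how to learn more discriminative representations while avoiding overfitting. Since the concepts of categories tend to be ambiguous, it is important to capture more instance-level information. We thus propose a new framework, termed Attract-and-Repulse, which consists of Contrastive Regularization (CR) to enrich the feature representations, Symmetric Cross Entropy (SCE) to balance the fitting of different classes, and Mean Teacher to calibrate label information. Specifically, SCE and CR learn discriminative representations while alleviating overfitting through an adaptive trade-off between class information (attract) and instance information (repulse). Mean Teacher then further improves performance by calibrating more accurate soft pseudo-labels. Extensive experiments validate the effectiveness of the Attract-and-Repulse framework. Together with other strategies, such as aggressive data augmentation, TenCrop inference, and model ensembling, we achieved second place in the ICCV 2021 VIPriors Image Classification Challenge.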
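For readers unfamiliar with Symmetric Cross Entropy, the following minimal sketch shows the commonly used form: a weighted sum of the usual cross entropy and a reverse cross entropy in which `log 0` for the one-hot label is replaced by a negative constant. The weights `alpha` and `beta` and the constant `log_zero` are illustrative assumptions, not values reported in this abstract.

```python
import math

def symmetric_cross_entropy(probs, target, alpha=1.0, beta=1.0, log_zero=-4.0):
    """SCE for one sample: alpha * CE + beta * reverse CE.

    probs:    predicted class probabilities (should sum to 1)
    target:   index of the ground-truth class
    log_zero: stand-in for log(0) on the one-hot label (assumed value)
    """
    eps = 1e-7  # guard against log(0) on the prediction side
    ce = -math.log(max(probs[target], eps))
    # Reverse CE is -sum_k p_k * log(q_k) with q one-hot:
    # log(q_target) = log(1) = 0, every other class contributes p_k * log_zero.
    rce = -sum(p * log_zero for k, p in enumerate(probs) if k != target)
    return alpha * ce + beta * rce

loss = symmetric_cross_entropy([0.7, 0.2, 0.1], target=0)
```

The reverse term grows linearly with the probability mass placed on wrong classes, which bounds its contribution compared with the logarithmic growth of the standard term.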


A Bidirectional Tree Tagging Scheme for Joint Medical Relation Extraction

Xukun Luo , Weijie Liu , Meng Ma , Ping Wang

Category: Natural Language Processing

2020-08-31

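Joint medical relation extraction refers to extracting triples, composed of entities and relations, from medical text with a single model. One solution is to convert this task into a sequence tagging task. However, among existing works, methods that represent and tag the triples in a linear way fall short, while methods that organize the triples as a graph face the challenge of heavy computation. In this paper, inspired by the tree-like relation structures in medical text, we propose a novel scheme called Bidirectional Tree Tagging (BITT), which organizes the medical relation triples into two binary trees and converts the trees into a word-level tag sequence. Based on the BITT scheme, we develop a joint relation extraction model that predicts the BITT tags and then extracts the medical triples. Our model outperforms the best baselines by 2.0% and 2.5% in F1 score on two medical datasets. Moreover, models using our BITT scheme also obtain promising results on three public datasets from other domains.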


Informing selection of performance metrics for medical image segmentation evaluation using configurable synthetic errors

Shuyue Guan , Ravi K. Samala , Weijie Chen

Category: Computer Vision

2022-12-30

Machine learning-based segmentation in medical imaging is widely used in clinical applications from diagnostics to radiotherapy treatment planning. Segmented medical images with ground truth are useful for investigating the properties of different segmentation performance metrics to inform metric selection. Regular geometrical shapes are often used to synthesize segmentation errors and illustrate properties of performance metrics, but they lack the complexity of anatomical variations in real images. In this study, we present a tool to emulate segmentations by adjusting the reference (truth) masks of anatomical objects extracted from real medical images. Our tool is designed to modify the defined truth contours and emulate different types of segmentation errors with a set of user-configurable parameters. We defined the ground truth objects from 230 patient images in the Glioma Image Segmentation for Radiotherapy (GLIS-RT) database. For each object, we used our segmentation synthesis tool to synthesize 10 versions of segmentation (i.e., 10 simulated segmentors or algorithms), where each version has a pre-defined combination of segmentation errors. We then applied 20 performance metrics to evaluate all synthetic segmentations. We demonstrated the properties of these metrics, including their ability to capture specific types of segmentation errors. By analyzing the intrinsic properties of these metrics and categorizing the segmentation errors, we are working toward the goal of developing a decision-tree tool for assisting in the selection of segmentation performance metrics.
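As a small illustration of why metric choice matters, the sketch below computes two widely used overlap metrics, the Dice coefficient and the Jaccard index, on flattened binary masks with three hand-made error types. The abstract does not list its 20 metrics or its error parameters, so the metric selection and the toy masks here are assumptions made purely for illustration.

```python
def dice(pred, truth):
    """Dice coefficient: 2 * |A and B| / (|A| + |B|) on flat 0/1 masks."""
    inter = sum(p & t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 1.0 if total == 0 else 2.0 * inter / total

def jaccard(pred, truth):
    """Jaccard index: |A and B| / |A or B| on flat 0/1 masks."""
    inter = sum(p & t for p, t in zip(pred, truth))
    union = sum(p | t for p, t in zip(pred, truth))
    return 1.0 if union == 0 else inter / union

truth   = [0, 1, 1, 1, 1, 0, 0, 0]
over    = [1, 1, 1, 1, 1, 1, 0, 0]  # over-segmentation: mask grown by one pixel per side
under   = [0, 0, 1, 1, 0, 0, 0, 0]  # under-segmentation: mask shrunk by one pixel per side
shifted = [0, 0, 0, 1, 1, 1, 1, 0]  # correct size, displaced by two pixels

for name, mask in [("over", over), ("under", under), ("shifted", shifted)]:
    print(name, round(dice(mask, truth), 3), round(jaccard(mask, truth), 3))
```

Both metrics rank the three errors in the same order here, and neither distinguishes the direction of a size error from its magnitude, which is the kind of property a metric-selection study would need to expose.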


A Lightweight Reconstruction Network for Surface Defect Inspection

Chao Hu , Jian Yao , Weijie Wu , Weibin Qiu , Liqiang Zhu

Category: Computer Vision | Machine Learning

2022-12-25

Currently, most deep learning methods cannot cope with the scarcity of industrial product defect samples and the significant differences in defect characteristics. This paper proposes an unsupervised defect detection algorithm based on a reconstruction network, which is realized using only a large number of easily obtained defect-free sample data. The network includes two parts: image reconstruction and surface defect area detection. The reconstruction network is designed as a fully convolutional autoencoder with a lightweight structure. Only a small number of normal samples are used for training so that the reconstruction network can generate a defect-free reconstructed image. A function combining structural loss and $L_1$ loss is proposed as the loss function of the reconstruction network to address the poor detection of defects on irregularly textured surfaces. Further, the residual between the reconstructed image and the image under test is used as the candidate defect region, and conventional image operations then localize the defect. The proposed unsupervised defect detection algorithm is evaluated on multiple defect image sample sets. Compared with other similar algorithms, the results show that it is robust and accurate.
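The loss and the residual step described above can be sketched as follows. This is a simplified illustration, not the paper's implementation: the abstract only says "structural loss", so using SSIM is an assumption, the SSIM here is computed over one global window rather than sliding local windows, and the weight `alpha` and the residual `threshold` are hypothetical values.

```python
def mean(values):
    return sum(values) / len(values)

def ssim_global(x, y, dynamic_range=1.0):
    """Single-window SSIM over flat pixel lists (simplification of windowed SSIM)."""
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mx, my = mean(x), mean(y)
    vx = mean([(a - mx) ** 2 for a in x])
    vy = mean([(b - my) ** 2 for b in y])
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx * mx + my * my + c1) * (vx + vy + c2))

def reconstruction_loss(x, y, alpha=0.5):
    """alpha * (1 - SSIM) + (1 - alpha) * L1; alpha is an assumed weight."""
    l1 = mean([abs(a - b) for a, b in zip(x, y)])
    return alpha * (1.0 - ssim_global(x, y)) + (1.0 - alpha) * l1

def defect_mask(image, reconstruction, threshold=0.2):
    """Mark pixels whose absolute residual exceeds the (assumed) threshold."""
    return [1 if abs(a - b) > threshold else 0
            for a, b in zip(image, reconstruction)]

test_image     = [0.2, 0.2, 0.9, 0.2]  # one bright defect pixel
reconstruction = [0.2, 0.2, 0.2, 0.2]  # what a defect-free reconstruction might look like
mask = defect_mask(test_image, reconstruction)
```

Because the network is trained only on defect-free samples, it reconstructs the normal texture and the defect survives in the residual, which is what the thresholding step relies on.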


TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities

Zhe Zhao , Yudong Li , Cheng Hou , Jing Zhao , Rong Tian , Weijie Liu , Yiren Chen , Ningyuan Sun , Haoyan Liu , Weiquan Mao

Category: Natural Language Processing

2022-12-13

Recently, the success of pre-training in the text domain has been fully extended to vision, audio, and cross-modal scenarios. The proposed pre-training models of different modalities show a rising trend of homogeneity in their model structures, which brings the opportunity to implement different pre-training models within a uniform framework. In this paper, we present TencentPretrain, a toolkit supporting pre-training models of different modalities. The core feature of TencentPretrain is its modular design. The toolkit uniformly divides pre-training models into 5 components: embedding, encoder, target embedding, decoder, and target. As almost all common modules are provided in each component, users can choose the desired modules from different components to build a complete pre-training model. The modular design enables users to efficiently reproduce existing pre-training models or build brand-new ones. We test the toolkit on text, vision, and audio benchmarks and show that it can match the performance of the original implementations.


IDMS: Instance Depth for Multi-scale Monocular 3D Object Detection

Chao Hu , Liqiang Zhu , Weibing Qiu , Weijie Wu

Category: Computer Vision

2022-12-03

To address the lack of depth information in images and the poor detection accuracy of monocular 3D object detection, we propose an instance-depth method for multi-scale monocular 3D object detection. First, we design a multi-scale perception module based on dilated convolution to enhance the model's ability to handle targets of different scales. The depth features containing multi-scale information are re-refined along both the spatial and channel directions to account for the inconsistency between feature maps of different scales. Second, to give the model better 3D perception, we use instance depth information as an auxiliary learning task that enhances the spatial depth features of the 3D target, and we use sparse instance depth to supervise this auxiliary task. Finally, experiments on the KITTI test set and evaluation set show that, compared with the baseline method, the proposed method improves AP40 in the car category by 5.27%, effectively improving the performance of monocular 3D object detection.


Runtime Analysis for the NSGA-II: Proving, Quantifying, and Explaining the Inefficiency For Many Objectives

Weijie Zheng , Benjamin Doerr

Category: Neural and Evolutionary Computing | Artificial Intelligence

2022-11-23

The NSGA-II is one of the most prominent algorithms for solving multi-objective optimization problems. Despite numerous successful applications, several studies have shown that the NSGA-II is less effective for larger numbers of objectives. In this work, we use mathematical runtime analyses to rigorously demonstrate and quantify this phenomenon. We show that even on the simple OneMinMax benchmark, where every solution is Pareto optimal, the NSGA-II, even with large population sizes, cannot compute the full Pareto front (objective vectors of all Pareto optima) in sub-exponential time when the number of objectives is at least three. Our proofs suggest that the reason for this unexpected behavior lies in the fact that, in the computation of the crowding distance, the different objectives are regarded independently. This is not a problem for two objectives, where any sorting of a pair-wise incomparable set of solutions according to one objective is also such a sorting according to the other objective (in the inverse order).
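The point about the crowding distance is easier to see in code. The sketch below follows the standard textbook formulation of the NSGA-II crowding distance; it is not taken from this paper, and the function name `crowding_distances` is an illustrative choice. Note how the loop handles one objective at a time: each objective sorts the population on its own and adds its own contribution.

```python
import math

def crowding_distances(objectives):
    """Crowding distance for one front; objectives[i][k] is objective k of solution i."""
    n = len(objectives)
    if n == 0:
        return []
    m = len(objectives[0])
    dist = [0.0] * n
    for k in range(m):  # each objective is processed independently
        order = sorted(range(n), key=lambda i: objectives[i][k])
        lo = objectives[order[0]][k]
        hi = objectives[order[-1]][k]
        # Boundary solutions of every objective get infinite distance.
        dist[order[0]] = dist[order[-1]] = math.inf
        if hi == lo:
            continue  # all values equal: no finite contribution from this objective
        for pos in range(1, n - 1):
            i = order[pos]
            gap = objectives[order[pos + 1]][k] - objectives[order[pos - 1]][k]
            dist[i] += gap / (hi - lo)  # normalized distance between the two neighbours
    return dist

front = [(0, 4), (1, 3), (2, 2), (4, 0)]  # two objectives, pair-wise incomparable
print(crowding_distances(front))
```

For the two-objective front above, sorting by the first objective yields exactly the reverse of sorting by the second, so each solution has the same neighbours under both objectives. With three or more objectives, the per-objective sortings no longer agree in this way, which is the situation the abstract identifies as the source of the difficulty.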


On Quantum Speedups for Nonconvex Optimization via Quantum Tunneling Walks

Yizhou Liu , Weijie J. Su , Tongyang Li

Category: Machine Learning

2022-09-29

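Classical algorithms are often ineffective for solving nonconvex optimization problems in which local minima are separated by high barriers. In this paper, we explore quantum speedups for nonconvex optimization by leveraging the global effect of quantum tunneling. Specifically, we introduce a quantum algorithm called the quantum tunneling walk (QTW) and apply it to nonconvex problems whose local minima are approximately global minima. We show that QTW achieves a quantum speedup over classical stochastic gradient descent (SGD) when the barriers between different local minima are high but thin and the minima are flat. Based on this observation, we construct a specific double-well landscape in which classical algorithms cannot efficiently hit the target well, whereas QTW can when given a proper initial state near the known well. Finally, we corroborate our findings with numerical experiments.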


SAMP: A Toolkit for Model Inference with Self-Adaptive Mixed-Precision

Rong Tian , Zijing Zhao , Weijie Liu , Haoyan Liu , Weiquan Mao , Zhe Zhao , Kimmo Yan

Category: Machine Learning | Natural Language Processing

2022-09-19

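The latest industrial inference engines, such as FasterTransformer and TurboTransformers, have verified that half-precision floating point (FP16) and 8-bit integer (INT8) quantization can greatly improve model inference speed. However, existing FP16 or INT8 quantization methods are too complicated, and improper use can severely damage model performance. In this paper, we develop a toolkit that lets users easily quantize their models for inference. It introduces Self-Adaptive Mixed-Precision (SAMP), which automatically controls the quantization rate through a mixed-precision architecture to balance efficiency and performance. Experimental results show that our SAMP toolkit achieves a higher speedup than PyTorch and FasterTransformer while ensuring the required performance. In addition, SAMP is based on a modular design that decouples the tokenizer, embedding, encoder, and target layers, which allows users to handle various downstream tasks and to integrate it seamlessly into PyTorch.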
